What is at Stake with High Stakes Testing? A Discussion of Issues and Research

Author

  • GREGORY J. MARCHANT
Abstract

High stakes tests are defined as those tests that “carry serious consequences for students or educators.” The consequences of standardized achievement tests range from grade retention for schoolchildren to rewards or punitive measures for schools and school districts. The nature of the standardized achievement tests used in these situations poses validity problems for the decisions based on them. Numerous unintended negative consequences for students, teachers, curriculum, and schools have been identified. Research has yet to establish clear benefits from these high stakes practices. Therefore, with little empirical support and high financial and human costs, a cost/benefit analysis suggests that the high stakes testing bandwagon, further fueled by No Child Left Behind, needs to be carefully evaluated before it continues to roll.

OHIO J SCI 104 (2):2–7, 2004. Manuscript received 5 December 2002 and in revised form 14 May 2003 (#02-29).

INTRODUCTION

For both advocates and opponents of the use of standardized tests in decisions regarding students, teachers, and educational policies, the answer to “what is at stake with high stakes testing?” is the same. The answer is, “everything.” In an effort to implement accountability measures for districts, schools, teachers, and even individual students, testing originally designed to provide information regarding individual student achievement and ability for diagnostic/prescriptive teaching efforts is now being used as the measuring stick for evaluating the success of students, teachers, schools, districts, and even states. With important decisions resting on the results of certain test scores, it is important to know how well the scores reflect the quality of learning and education. It is also important to consider whether decisions based on these tests tend to reflect accurate interpretations and result in best practice.
Even a potentially useful tool for education may be considered inappropriate if its use routinely results in harm to children. This article began as a review of the current research to explore the results of high stakes testing; of particular interest was its effect on student learning. Surprisingly and unfortunately, the impact of high stakes testing on student achievement has not been investigated. Therefore, this article reviews the research and concerns addressed in the literature regarding high stakes testing.

DEFINITION OF HIGH STAKES TESTING

A position statement issued by the American Educational Research Association in July of 2000 described high-stakes testing as follows:

Many states and school districts mandate testing programs to gather data about student achievement over time and to hold schools and students accountable. Certain uses of achievement test results are termed “high stakes” if they carry serious consequences for students or educators. Schools may be judged according to the school-wide average scores for their students. High school-wide scores may bring public praise or financial rewards; low scores may bring public embarrassment or heavy sanctions. For individual students, high scores may bring a special diploma attesting to exceptional academic accomplishment; low scores may result in students being held back in grade or denied a high school diploma.

The statement then identified the 1999 Standards for Educational and Psychological Testing as guidelines for high-stakes testing efforts. The guidelines include protection against high-stakes decisions based on a single test, full disclosure of likely negative consequences of high-stakes testing programs, alignment of the test and the curriculum, opportunities for remediation for those who fail, and appropriate attention to language differences and disabilities.

High-stakes tests are usually national or state-wide standardized achievement tests.
If a test is “standardized,” it has set rules for administration, such that everyone taking the test receives exactly the same directions and has the same restrictions of time and resources. Achievement tests are usually for one specific grade level and designed to create a distribution of scores. Popular national standardized achievement tests are the Terra Nova and the Stanford-9. Many states have taken up the costly task of developing their own state achievement tests aligned with their state’s standards. Some of these tests were developed in conjunction with national test makers and share items. The SAT is not an achievement test, but an aptitude test designed to predict college achievement; however, because of its influence on college admissions decisions, it is also considered a high-stakes test.

THE NATURE OF STANDARDIZED ACHIEVEMENT TESTS

Most standardized achievement tests are norm-referenced, in that how well an individual does on the test is based on a comparison to a large group of test takers. “Good” is relative to others at the same grade level. This is in contrast to a criterion-referenced test, which defines how well one does on a test based on meeting criteria or mastering a standard. High stakes decisions tend to involve either relative comparisons or reaching a pre-defined cut-off point. However, the decision as to where the cut-off point will be is almost always informed by norm-referenced information, such as the difficulty levels of the items selected or even the percentile rank of a score. Thus, if a cut-off score equates to the 40th percentile, the decision makers know that approximately 40% of the test-takers will not “pass” the test. Therefore, the setting of the cut-off score is very important on high-stakes tests that require passage.
For example, if a state like Ohio, which averages 140,000 students at each grade level, were to raise the cut-off score for a required achievement test by 5 percentile points, approximately 7,000 more children would not reach the cut-off at each grade level.

There are several problems inherent in standardized achievement tests as the basis for high stakes decisions (Popham 1999). Test designers, desiring a good distribution of scores to be able to differentiate students, do not want too many items that almost everybody gets right (or wrong). If just about everybody gets an item right, it does not differentiate among students. Therefore, basic skills items that are important for everyone to master (and many do master them) are unlikely to show up on the test in large numbers. As a result, some of the most important basic skills are not given much attention in tests. Due to limitations of time, the number of items measuring any particular skill or knowledge may be too few to provide a reliable measure of that skill. A strength or weakness may be determined by a few good guesses or a few skipped items. Time constraints and restrictions in the range and nature of the items (usually multiple-choice responses) suggest that, although an achievement test can provide some information, as a one-time paper-and-pencil assessment it has serious limitations in measuring the variety and scope of classroom learning.
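
The cut-off arithmetic in the Ohio example above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 40th and 45th percentile values are assumed starting and ending points (the text specifies only a 5-point increase), and the sketch assumes the state's students are distributed like the norm group, so that a cut-off at the p-th percentile leaves roughly p% of students below it.

```python
def students_below_cutoff(cohort_size: int, cutoff_percentile: float) -> int:
    """Approximate number of students scoring below a norm-referenced cut-off.

    Assumes the cohort matches the norm group, so a cut-off at the
    p-th percentile leaves about p percent of students below it.
    """
    if not 0 <= cutoff_percentile <= 100:
        raise ValueError("cutoff_percentile must be between 0 and 100")
    return round(cohort_size * cutoff_percentile / 100)


def additional_below_cutoff(cohort_size: int, old_percentile: float,
                            new_percentile: float) -> int:
    """Extra students who fall below the cut-off when it is raised."""
    return (students_below_cutoff(cohort_size, new_percentile)
            - students_below_cutoff(cohort_size, old_percentile))


# Cohort size quoted in the text for Ohio (students per grade level).
OHIO_COHORT = 140_000

# Illustrative percentiles only: any 5-point increase gives the same result.
print(additional_below_cutoff(OHIO_COHORT, 40, 45))
```

Because the estimate is a simple proportion of the cohort, the result depends only on the size of the increase and not on where the cut-off started, which is why the text can quote a single figure of about 7,000 students per grade level.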

Similar resources

A Review of Internet-Centered Language Assessment: Origins, Challenges, and Perspectives

This article defines the origin of an internet-centered language assessment (ICLA), how ICLAs are different from the other traditional computer-oriented tests, and what uses and functions ICLAs have in different taxonomies of language testing. After a very short review of computer-oriented testing, ICLAs are defined and categorized in low-tech or high-tech categories. Since low-tech tests are ...

Assessing Assessment Literacy: Insights From a High-Stakes Test

This study constitutes an attempt to see what language assessment literacy (LAL) is for three groups of stakeholders, namely LAL test developers, LAL instructors, and LAL test-takers. The perceptions of the former group were derived from the content analysis of the latest version of the LAL test, and those of the latter 2 groups were assessed through a survey designed by the researcher. Participant...

How Much is at Stake for the Pragmatic Encroacher

Many people are saying nowadays that what you know partly depends on what practical decisions you face (e.g. Stanley 2005; Fantl and McGrath 2009). This “pragmatic encroachment” thesis usually involves two different ideas. One idea is that knowledge plays a distinctive role in practical reasoning: you can act on what you know. The other idea is that knowledge is harder to achieve when more is a...

High Stakes Require More Than Just Talk: What to Do About Corruption in Health Systems; Comment on “We Need to Talk About Corruption in Health Systems”

Reluctance to talk about corruption is an important barrier to action. Yet the stakes of not addressing corruption in the health sector are higher than ever. Corruption includes wrongdoing by individuals, but it is also a problem of weak institutions captured by political interests, and underfunded, unreliable administrative systems and healthcare delivery models. We ur...

ACADEMIC WRITING REVISITED: A PHRASEOLOGICAL ANALYSIS OF APPLIED LINGUISTICS HIGH-STAKE GENRES FROM THE PERSPECTIVE OF LEXICAL BUNDLES

Lexical bundles are frequent word combinations that commonly appear in different registers. They have been the subject of much research in the area of corpus linguistics during the last decade. While most previous studies of bundles have mainly focused on variations in the use of these word combinations across different registers and a number of disciplines, not much research has been done to e...

Publication date: 2017